04 Resampling



In the case of the images that we have loaded so far, all the data have the same resolution (30m). In the previous tutorial we saw that it is straightforward to align these datasets even though they cover slightly different areas. In some cases, though, the resolution differs between bands. This is the case for band 8 (the panchromatic band), which has a resolution of 15m. This section will demonstrate aggregation (down-sampling) and interpolation (up-sampling). In practice, aggregation is much more common.
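Before diving in, the two operations can be sketched in plain NumPy on a small hypothetical grid: aggregation collapses blocks of cells into single values, while nearest-neighbor up-sampling repeats each cell (all values here are made up for illustration):

```python
import numpy as np

# A hypothetical 4x4 "30m" grid of values
data = np.arange(16, dtype=float).reshape(4, 4)

# Aggregation (down-sampling): mean over 2x2 blocks -> a coarser 2x2 grid
coarse = data.reshape(2, 2, 2, 2).mean(axis=(1, 3))

# Interpolation (up-sampling): nearest-neighbor repetition -> a finer 8x8 grid
fine = data.repeat(2, axis=0).repeat(2, axis=1)

print(coarse.shape, fine.shape)  # (2, 2) (8, 8)
```

The tools below (datashader and xarray) do essentially this, but handle coordinates, irregular bounds, and lazy dask arrays for us.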

In [1]:
import intake
import numpy as np
import xarray as xr

import holoviews as hv

import cartopy.crs as ccrs
import geoviews as gv

import hvplot.xarray

hv.extension('bokeh', width=80)

We'll be using the datashader operations rasterize and regrid to handle our multidimensional regridding.

In [2]:
from holoviews.operation.datashader import regrid, rasterize
from datashader import transfer_functions as tf, reductions as rd

Recap: Loading data

In [3]:
cat = intake.open_catalog('../catalog.yml')
l5_da = cat.l5().read_chunked()
l8_da = cat.l8().read_chunked()
crs = ccrs.epsg(32611)

Compute NDVI

Now we will calculate NDVI for each of these image sets.

In [4]:
NDVI_1988 = (l5_da.sel(band=5) - l5_da.sel(band=4)) / (l5_da.sel(band=5) + l5_da.sel(band=4))
NDVI_2017 = (l8_da.sel(band=5) - l8_da.sel(band=4)) / (l8_da.sel(band=5) + l8_da.sel(band=4))

Aligning the data

In [5]:
NDVI_by_year = xr.concat([NDVI_1988, NDVI_2017], dim=xr.DataArray([1988, 2017], dims='year', name='year'))
NDVI_by_year
Out[5]:
<xarray.DataArray (year: 2, y: 7941, x: 7961)>
dask.array<shape=(2, 7941, 7961), dtype=float64, chunksize=(1, 5, 256)>
Coordinates:
  * x        (x) float64 2.424e+05 2.424e+05 2.425e+05 ... 4.812e+05 4.812e+05
  * y        (y) float64 4.188e+06 4.188e+06 4.188e+06 ... 4.426e+06 4.426e+06
  * year     (year) int64 1988 2017

Select region of interest

We'll use the area around the central point as the Region of Interest (ROI). In this case we'll use a 30 km box around the center point.

In [6]:
x_center, y_center = crs.transform_point(-118.7081, 38.6942, ccrs.PlateCarree())
buffer = 1.5e4
In [7]:
xmin = x_center - buffer
xmax = x_center + buffer
ymin = y_center - buffer
ymax = y_center + buffer
In [8]:
ROI = NDVI_by_year.sel(x=slice(xmin, xmax), y=slice(ymin, ymax))
In [9]:
p = ROI.hvplot('x','y', col='year', crs=crs, shared_axes=True, width=400, height=300, cmap='viridis')

Aggregation

We'll define a new resolution that is visibly different from 30m.

In [10]:
res = 1e3

Just to make things pretty and as a sanity check, let's turn the colorbar back on for both plots and make the first plot slightly wider to account for the extra axis labels it displays.

In [11]:
p[1988] = p[1988].options(colorbar=True, width=370, height=300)
p[2017] = p[2017].options(colorbar=True, width=310, height=300)
In [12]:
p_1000 = regrid(p, x_sampling=res, y_sampling=res)
p_1000
Out[12]:

Notice how fast it was to generate these plots; that is because the resolution is low. Aggregation is by mean by default for gridded data, but there are other ways to aggregate. For instance, we can aggregate by the standard deviation of the grid cells that fall into each pixel:

In [13]:
regrid(p, x_sampling=res, y_sampling=res, aggregator=rd.std()).relabel('Aggregated by std')
Out[13]:
In [14]:
# Exercise: Try to regrid using a different aggregator. Use tab completion to find other methods on `rd`.
In [15]:
# Exercise: Now using std, try changing the resolution.

Here, the std values are high when the displayed pixel includes grid cells with a wide range of different NDVI values, which is clearly true around the bounds of the lake in 2017.
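To see concretely why std highlights mixed pixels, here is a minimal NumPy sketch with hypothetical NDVI values for the cells falling into one coarse pixel:

```python
import numpy as np

# A hypothetical block straddling a land/water boundary: half high NDVI, half low
block = np.array([0.6, 0.6, -0.1, -0.1])
print(block.mean())  # ~0.25 -- the mean hides the boundary
print(block.std())   # ~0.35 -- the std flags it

# A uniform block gives std of 0
uniform = np.array([0.6, 0.6, 0.6, 0.6])
print(uniform.std())  # 0.0
```

This is why the std aggregation lights up along the lake shoreline while uniform desert or water areas stay dark.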

Similar workflow in xarray

The above computations used Datashader, but we can achieve a similar result in xarray by grouping the values into bins at the desired resolution and taking the mean of each bin.
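As a minimal 1-D sketch of how `groupby_bins` aggregates (with made-up coordinates at 10 m spacing, binned to 30 m; note `include_lowest=True` so the leftmost point is not dropped from the first bin):

```python
import numpy as np
import xarray as xr

# Six hypothetical samples at 10 m spacing
da = xr.DataArray(np.arange(6.), dims='x',
                  coords={'x': np.arange(0, 60, 10)})

# Bin into 30 m intervals [0, 30] and (30, 60], labeling each bin by its left edge
binned = (da.groupby_bins('x', [0, 30, 60], labels=[0, 30], include_lowest=True)
            .mean()
            .rename(x_bins='x'))
print(binned.values)  # [1.5 4.5]
```

The 2-D case below is the same idea applied once per coordinate.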

In [16]:
res = 1e3
In [17]:
x = np.arange(ROI.x.min(), ROI.x.max(), res)
y = np.arange(ROI.y.min(), ROI.y.max(), res)

We'll use the left edge as the label for now.

In [18]:
da_1000 = (ROI
    .groupby_bins('x', x, labels=x[:-1]).mean(dim='x')
    .groupby_bins('y', y, labels=y[:-1]).mean(dim='y')
    .rename(x_bins='x',y_bins='y')
)
da_1000
Out[18]:
<xarray.DataArray (year: 2, y: 29, x: 29)>
dask.array<shape=(2, 29, 29), dtype=float64, chunksize=(1, 1, 1)>
Coordinates:
  * y        (y) float64 4.269e+06 4.27e+06 4.271e+06 ... 4.296e+06 4.297e+06
  * x        (x) float64 3.365e+05 3.375e+05 3.385e+05 ... 3.635e+05 3.645e+05
  * year     (year) int64 1988 2017
In [19]:
# Exercise: Try to aggregate using a method other than mean.

Compare

We can compare this to the results of the datashader regridding by extracting the data from p_1000, reindexing da_1000 onto its coordinates with nearest-neighbor matching, and subtracting.
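The key step is `reindex` with `method='nearest'`, which snaps one set of coordinate labels onto another before subtracting; without it, xarray's automatic alignment keeps only exactly-matching labels. A toy 1-D illustration with hypothetical, slightly offset coordinates:

```python
import numpy as np
import xarray as xr

# Two hypothetical coarse grids whose coordinate labels are offset
a = xr.DataArray([1., 2., 3.], dims='x', coords={'x': [0., 10., 20.]})
b = xr.DataArray([1., 2., 3.], dims='x', coords={'x': [4., 14., 24.]})

# Direct subtraction aligns on exact labels -> empty intersection
print((a - b).sizes)  # {'x': 0}

# reindex with method='nearest' snaps a's labels onto b's before subtracting
diff = a.reindex(x=b.x, method='nearest') - b
print(diff.values)  # [0. 0. 0.]
```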

In [20]:
def get_data(p):
    df = p.dframe()
    pivotted = df.pivot(index='y', columns='x', values='value')
    stacked = pivotted.stack()
    return xr.DataArray.from_series(stacked)
In [21]:
(da_1000.sel(year=2017).reindex(get_data(p_1000[2017]).indexes, method='nearest') - 
 get_data(p_1000[2017])).hvplot('x','y')
Out[21]:
In [22]:
# Exercise: Why are the top and right edges particularly bad, and how would you fix that?

Handling bands with different resolutions

First we need to load the band-8 data. We'll grab it straight from S3, fetching band 8 for the same path, row, and acquisition time. You can inspect data availability for this scene at landsatonaws.com.

In [23]:
da_8 = cat.amazon_landsat_band(product_id='LC08_L1TP_042033_20171022_20171107_01_T1', path=42, row=33, band=8).to_dask()

We can slice out our ROI, but we have to be careful to specify the bounds in the appropriate order for each coordinate: max:min for y (which is in decreasing order) and min:max for x (which is in increasing order).
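A toy illustration (with hypothetical values) of why the slice order matters for a decreasing coordinate:

```python
import numpy as np
import xarray as xr

# A hypothetical DataArray with a decreasing y coordinate, as in these Landsat rasters
da = xr.DataArray(np.arange(4.), dims='y', coords={'y': [40., 30., 20., 10.]})

# Slice bounds must follow the coordinate's own order: max to min here
print(da.sel(y=slice(35, 15)).y.values)  # [30. 20.]

# min-to-max order silently returns an empty selection
print(da.sel(y=slice(15, 35)).y.values)  # []
```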

In [24]:
ROI_8 = da_8.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))
ROI_8 = ROI_8.drop('band').squeeze().persist()
ROI_8
Out[24]:
<xarray.DataArray (y: 2000, x: 2000)>
dask.array<shape=(2000, 2000), dtype=uint16, chunksize=(3, 189)>
Coordinates:
  * y        (y) float64 4.299e+06 4.299e+06 4.299e+06 ... 4.269e+06 4.269e+06
  * x        (x) float64 3.365e+05 3.365e+05 3.365e+05 ... 3.664e+05 3.664e+05
Attributes:
    transform:   (15.0, 0.0, 243292.5, 0.0, -15.0, 4425907.5)
    crs:         +init=epsg:32611
    res:         (15.0, 15.0)
    is_tiled:    1
    nodatavals:  (nan,)
In [25]:
p_8 = ROI_8.hvplot('x', 'y', width=500, height=400)
p_8
Out[25]:

Let's define a little helper function to determine the resolution of plots

In [26]:
def get_res(p, x='x', y='y'):
    df = p.dframe()
    pivotted = df.pivot(index=y, columns=x, values='value')
    stacked = pivotted.stack()
    da = xr.DataArray.from_series(stacked)
    print(f'{x} res:', np.unique(np.around(da[x].diff(x), 2)))
    print(f'{y} res:', np.unique(np.around(da[y].diff(y), 2)))
In [27]:
get_res(p_8)
x res: [15.]
y res: [15.]

We can use xarray to merge this band with the rest of our data, taking the union of all the coordinates. In this case the shape expands from (1000, 1000) to (2000, 2000).

In [28]:
ds = xr.merge([{'NDVI': ROI, '2017_band_8': ROI_8}])
ds
Out[28]:
<xarray.Dataset>
Dimensions:      (x: 2000, y: 2000, year: 2)
Coordinates:
  * y            (y) float64 4.269e+06 4.269e+06 ... 4.299e+06 4.299e+06
  * x            (x) float64 3.365e+05 3.365e+05 ... 3.664e+05 3.664e+05
  * year         (year) int64 1988 2017
Data variables:
    NDVI         (year, y, x) float64 dask.array<shape=(2, 2000, 2000), chunksize=(1, 206, 385)>
    2017_band_8  (y, x) uint16 dask.array<shape=(2000, 2000), chunksize=(205, 189)>

All of our data are properly represented, but methods like selecting the nearest value to a certain x, y might yield NaNs:

In [29]:
ds.sel(x=x_center, y=y_center, method='nearest').compute()
Out[29]:
<xarray.Dataset>
Dimensions:      (year: 2)
Coordinates:
    y            float64 4.284e+06
    x            float64 3.514e+05
  * year         (year) int64 1988 2017
Data variables:
    NDVI         (year) float64 nan nan
    2017_band_8  uint16 6075

We can regrid band 8 to a 30m resolution (aggregate) or we can regrid the NDVI to a 15m resolution (interpolate); here let's aggregate:

In [30]:
res = 30
p_8_30 = regrid(p_8, x_sampling=res, y_sampling=res, width=500, height=400, 
                x_range=(x_center-1e3, x_center+1e3), y_range=(y_center-1e3, y_center+1e3))
p_8_30 
Out[30]:
In [31]:
get_res(p_8_30)
x res: [30.3]
y res: [30.3]

NOTE: x_sampling and y_sampling set the minimum allowable resolution, so the resolution of a given plot will not be exactly x_sampling and y_sampling unless it is sufficiently zoomed in.

In [32]:
# Exercise: Try zooming in and out in the plot to recompute for the new resolution.

Interpolation

Now let's quickly take a look at up-sampling. For this we will use regrid, since up-sampling is not allowed in rasterize.

In [33]:
p_ndvi_15 = regrid(p, upsample=True, 
                   x_sampling=15, y_sampling=15, 
                   x_range=(x_center-1e3, x_center+1e3), y_range=(y_center-1e3, y_center+1e3))
p_ndvi_15
Out[33]:
In [34]:
get_res(p_ndvi_15[1988])
x res: [15.04]
y res: [15.04]

This doesn't look any more resolved than 30m, but that is because regrid uses nearest-neighbor interpolation by default, so the grid cells look the same size. The higher resolution becomes apparent when using linear interpolation.
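The difference between the two modes can be sketched in plain NumPy on a hypothetical 1-D transect (the coordinates and values are made up):

```python
import numpy as np

# Two hypothetical 30 m samples up-sampled to 15 m spacing
x_coarse = np.array([0., 30.])
v_coarse = np.array([0.1, 0.5])
x_fine = np.array([0., 15., 30.])

# Nearest-neighbor repeats the closest sample (blocky result);
# argmin breaks the tie at 15 toward the first sample
nearest = v_coarse[np.abs(x_fine[:, None] - x_coarse[None, :]).argmin(axis=1)]

# Linear interpolation blends the neighbors (smooth result)
linear = np.interp(x_fine, x_coarse, v_coarse)

print(nearest)  # [0.1 0.1 0.5]
print(linear)   # [0.1 0.3 0.5]
```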

In [35]:
p_ndvi_15 = regrid(p, interpolation='linear', upsample=True,
                   x_sampling=15, y_sampling=15, 
                   x_range=(x_center-1e3, x_center+1e3), y_range=(y_center-1e3, y_center+1e3))
p_ndvi_15.relabel('Using linear interpolation')
Out[35]:
In [36]:
get_res(p_ndvi_15[1988])
x res: [15.04]
y res: [15.04]

Similar workflow in xarray

xarray supports a number of interpolation methods for up-sampling data. Here is what it takes to re-scale the NDVI images from res=30 to res=15 to match the panchromatic band. The options are nearest and linear, with linear selected by default.

NOTE: Interpolation is not supported on dask arrays, so we need to load that data into memory first. We'll use .load() for this.
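As a minimal sketch of interp_like on hypothetical in-memory (non-dask) 1-D arrays, interpolating a coarse grid onto a finer template's coordinates:

```python
import numpy as np
import xarray as xr

# Hypothetical coarse (30 m) and fine (15 m) grids
coarse = xr.DataArray(np.array([0., 2., 4.]), dims='x',
                      coords={'x': [0., 30., 60.]})
fine_template = xr.DataArray(np.zeros(5), dims='x',
                             coords={'x': [0., 15., 30., 45., 60.]})

# interp_like interpolates coarse onto fine_template's coordinates
# (linear by default; requires scipy)
upsampled = coarse.interp_like(fine_template)
print(upsampled.values)  # [0. 1. 2. 3. 4.]
```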

In [37]:
ROI.load()
Out[37]:
<xarray.DataArray (year: 2, y: 1000, x: 1000)>
array([[[ 0.097074,  0.076331, ...,  0.083921,  0.085006],
        [ 0.110451,  0.034681, ...,  0.076006,  0.088437],
        ...,
        [ 0.120153,  0.158642, ..., -0.001621, -0.018514],
        [ 0.174352,  0.185494, ...,  0.007495, -0.009563]],

       [[ 0.193813,  0.179461, ...,  0.065202,  0.064832],
        [ 0.205616,  0.159787, ...,  0.069401,  0.068488],
        ...,
        [ 0.135962,  0.120389, ...,  0.101598,  0.10134 ],
        [ 0.110515,  0.103728, ...,  0.09309 ,  0.097967]]])
Coordinates:
  * x        (x) float64 3.365e+05 3.365e+05 3.365e+05 ... 3.664e+05 3.664e+05
  * y        (y) float64 4.269e+06 4.269e+06 4.269e+06 ... 4.299e+06 4.299e+06
  * year     (year) int64 1988 2017
In [38]:
ndvi_15 = ROI.interp_like(ROI_8)
ndvi_15
Out[38]:
<xarray.DataArray (year: 2, y: 2000, x: 2000)>
array([[[      nan,       nan, ...,       nan,       nan],
        [      nan,  0.174352, ..., -0.001034, -0.009563],
        ...,
        [      nan,  0.103762, ...,  0.083343,  0.086722],
        [      nan,  0.097074, ...,  0.084464,  0.085006]],

       [[      nan,       nan, ...,       nan,       nan],
        [      nan,  0.110515, ...,  0.095528,  0.097967],
        ...,
        [      nan,  0.199714, ...,  0.066981,  0.06666 ],
        [      nan,  0.193813, ...,  0.065017,  0.064832]]])
Coordinates:
  * year     (year) int64 1988 2017
  * y        (y) float64 4.299e+06 4.299e+06 4.299e+06 ... 4.269e+06 4.269e+06
  * x        (x) float64 3.365e+05 3.365e+05 3.365e+05 ... 3.664e+05 3.664e+05
In [39]:
# Exercise: Use hvplot to make a plot of these computed values

Next:

Now that you have learned how to regrid your data, you are likely ready to learn more about Machine Learning.